fix: memory not released after indexing (20GB+ RSS for 5MB data)#833

Closed
fxfxfx123 wants to merge 3 commits into
DeusData:mainfrom
fxfxfx123:fix/mimalloc-abandoned-thread-purge

Conversation

@fxfxfx123

Problem

After indexing a small project (65 files, 1.3MB, 2509 nodes), codebase-memory-mcp retains 20GB+ RSS on a 32GB Windows machine. Memory grows monotonically and is never released to the OS.

Root Cause (from source code)

  1. mimalloc arenas left behind by exited worker threads are not purged: mem.c does not set mi_option_arena_purge_mult or mi_option_page_reclaim_on_free.
  2. The default worker count uses all 20 logical cores, and each worker gets its own mimalloc arena plus an 8MB stack.
  3. DEFAULT_RAM_FRACTION=0.5 means a 16GB budget on this machine, so nothing pressures the process to release memory.
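
The numbers in points 2 and 3 can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical and do not mirror mem.c, and it assumes "GB" and "MB" mean GiB and MiB.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define GIB (1024ULL * MIB)

/* Hypothetical helper: memory budget as a fraction of physical RAM.
 * `fraction` must lie in [0, 1]. */
static uint64_t ram_budget_bytes(uint64_t total_ram, double fraction) {
    assert(fraction >= 0.0 && fraction <= 1.0);
    return (uint64_t)((double)total_ram * fraction);
}

/* Hypothetical helper: total stack reservation for `workers` threads,
 * each reserving `stack_per_worker` bytes. */
static uint64_t total_stack_bytes(int workers, uint64_t stack_per_worker) {
    assert(workers >= 0);
    return (uint64_t)workers * stack_per_worker;
}
```

Under these assumptions, a 0.5 fraction of 32GB gives a 16GB budget (8GB at 0.25), and 20 workers with 8MB stacks reserve 160MB of stack. The stacks alone are therefore a small part of the reported 20GB+ RSS.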

Fix

| File | Change | Effect |
| --- | --- | --- |
| src/foundation/mem.c | mi_option_arena_purge_mult=1 | Purge arenas aggressively |
| src/foundation/mem.c | mi_option_page_reclaim_on_free=1 | Reclaim pages from abandoned worker threads |
| src/foundation/mem.c | DEFAULT_RAM_FRACTION 0.5 to 0.25 | Lower memory budget |
| src/foundation/system_info.c | Cap initial workers at 8 | Reduce peak memory 60% |
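
The worker cap in the last row, and the 60% figure that goes with it, can be sketched as follows. This is a minimal illustration rather than the code in system_info.c: the names are hypothetical, and the reduction estimate assumes per-worker memory scales linearly with worker count.

```c
#include <assert.h>

/* Hypothetical constant; the PR caps the initial worker count at 8. */
#define MAX_INITIAL_WORKERS 8

/* Clamp the detected core count to the range [1, cap]. */
static int initial_worker_count(int detected_cores, int cap) {
    assert(cap > 0);
    if (detected_cores < 1) {
        return 1;
    }
    return detected_cores < cap ? detected_cores : cap;
}

/* Percent reduction in per-worker memory when dropping from `before`
 * to `after` workers, assuming memory is linear in worker count. */
static int percent_reduction(int before, int after) {
    assert(before > 0 && after >= 0 && after <= before);
    return (before - after) * 100 / before;
}
```

With 20 detected cores and a cap of 8, the clamp yields 8 workers, a 60% reduction in the worker-proportional share of memory. Machines with 8 or fewer cores are unaffected by the cap.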

Benchmark

| Metric | Before | After |
| --- | --- | --- |
| Threads | 23 | 7 |
| Peak RSS | 20GB+ | ~2GB |
| Steady RSS | 20GB+, growing | 13MB, stable |
| CPU | 100% | 1% |

Tested on Windows 11, i7-12700, 32GB RAM, 65-file project.

fxfxfx123 added 2 commits July 4, 2026 11:51
- Set mi_option_arena_purge_mult=1 (default 10) so arenas are purged
  aggressively without extra delay
- Set mi_option_page_reclaim_on_free=1 to reclaim pages from exited
  worker thread heaps
- Lower DEFAULT_RAM_FRACTION from 0.5 to 0.25 to reduce memory budget

Signed-off-by: fxfxfx123 <93531292+fxfxfx123@users.noreply.github.com>
Memory scales linearly with worker count (each gets its own mimalloc
arena + 8MB stack). Diminishing returns past 8 workers. On a 20-core
CPU this reduces peak memory by up to 60% with negligible speed loss.

Signed-off-by: fxfxfx123 <93531292+fxfxfx123@users.noreply.github.com>
@fxfxfx123 fxfxfx123 requested a review from DeusData as a code owner July 4, 2026 03:53
Move inline trailing comments to preceding lines to match project
style and satisfy clang-format-20.

Signed-off-by: fxfxfx123 <93531292+fxfxfx123@users.noreply.github.com>
@DeusData DeusData added the bug, stability/performance, windows, and priority/high labels Jul 4, 2026
@DeusData

DeusData commented Jul 4, 2026

Owner

Thanks for the Windows memory-retention fix for #832. Triage: high-priority stability/performance PR.

Review will check the mimalloc options and worker-count change separately: we need post-index RSS to return to sane levels, but we also need to avoid an over-broad throughput regression or conflicting with the explicit memory-budget work in #752/#685. Please keep the before/after memory evidence current in the PR.

@fxfxfx123

Author

Yes, still active and waiting for review. All CI checks pass. Happy to adjust if there are concerns about the worker-count change — the mimalloc arena purge (mem.c) is the core fix, and the worker cap (system_info.c) can be dropped if it conflicts with #752/#685.

@DeusData

DeusData commented Jul 4, 2026

Owner

Quick status note: this PR is one of four open memory/RAM-policy changes (#833, #752, #586, #685) that we've reviewed individually and found genuinely complementary — so rather than merging them piecemeal, we're doing a combined design pass over the whole memory policy (explicit override, host-tiered defaults, retention bounds, post-index release, and the Windows auto-sync driver in #841) and will respond here with a concrete direction shortly. Your work is very much part of that plan — thanks for your patience!

@DeusData

DeusData commented Jul 4, 2026

Owner

Thank you, your diagnosis found the right lever. The core of the #832 fix is mi_option_page_reclaim_on_free=1: mimalloc v3 flipped the reclaim default, so a long-lived thread no longer reclaims pages abandoned by exited worker threads, which produces the RSS ratchet you reported. That exact option is folded into the keystone memory fix, merged as 7884ccb (PR #853), with you credited. The other hunks were dropped after analysis: arena_purge_mult=1 is inert at our purge_delay=0, the DEFAULT_RAM_FRACTION change was dead code (all call sites pass an explicit fraction), and the worker cap belonged to a different layer (host-tiered budgets, since landed as #752).

Being straight with you: #832 isn't fully closed yet. The keystone also routes the two background index paths through a subprocess (so the kernel returns 100% of each cycle's RSS on exit), and #854 added budget-derived retention — so 'RSS won't come back' is fixed. But the trigger you hit on Windows (the watcher re-indexing on every poll even when nothing changed) is tracked separately as #841 and still to come. So I'll leave #832 open until that lands, but your page_reclaim finding is shipped. Really appreciate the sharp report — closing this PR in favor of the folded fix.
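
For readers unfamiliar with the subprocess approach mentioned above, here is a minimal POSIX sketch of the idea: run the memory-heavy job in a child process so that everything it allocated goes back to the kernel when the child exits. It is not the code from the keystone fix. The function names and the stand-in job are hypothetical, and it uses fork/waitpid, which are not available on native Windows.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*index_job_fn)(void);

/* Hypothetical stand-in for one indexing pass: allocates and touches
 * 64 MiB so the pages are actually resident. */
static int fake_index_job(void) {
    size_t n = 64u * 1024u * 1024u;
    char *buf = malloc(n);
    if (buf == NULL) {
        return 1;
    }
    memset(buf, 1, n);
    /* Deliberately not freed: process exit returns it to the kernel. */
    return 0;
}

/* Run `job` in a forked child and wait for it.
 * Returns the child's exit status, or -1 on failure. */
static int run_in_subprocess(index_job_fn job) {
    assert(job != NULL);
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: do the work, then exit without running atexit handlers. */
        _exit(job());
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent's RSS is unaffected by whatever the child's allocator retained, because the child's address space is torn down on exit. A Windows build would need a CreateProcess-based equivalent.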

@DeusData DeusData closed this Jul 4, 2026

Labels

bug: Something isn't working
priority/high: Needs near-term maintainer attention; high-impact bug, regression, safety issue, or release blocker.
stability/performance: Server crashes, OOM, hangs, high CPU/memory
windows: Windows-specific issues

2 participants